Extended Matching Commands

Besides the normal meta-characters, Proxomitron also features special matching commands. Looking a bit like function calls, they work to extend the normal matching rules and add all sorts of useful abilities that would be hard or impossible to do with meta-characters alone. The basic format looks a bit like so...

$COMMAND(Parameter1, Parameter2, ...)

Although the commands begin with a "$", this character alone does not have special meaning. Normally you can use it in a match without having to escape it. The only exception would be if you were matching text that actually looked like an existing command.

One last note: not all commands can be used in all places. Some function only in the match, others only in the replace (and a few in both). Likewise some commands work only in header filters while others are designed to be used in web filters. Any such restrictions will be mentioned in the command descriptions below.


Matching Command Reference

Now without further ado (OK, just one more ado: Ado! There, got that out of my system now), here's what you've all been waiting for: a (hopefully) complete listing of all matching commands, their parameters (if any), and their basic use.

Quick Jump: the commands documented below, in alphabetical order:

$AV $AVQ $CON $ESC $FILTER $IHDR $INEST $JUMP $LST $NEST $OHDR $RDIR $SET $SETPROXY $STOP $TYPE $UESC $URL $USEPROXY

$AV(match)

Restrictions: Match only
Filter Types: IN Headers, OUT Headers, or Match

This is used to match any attribute's value. It first parses and isolates the value - automatically taking things like quotes vs. no quotes into account. The match within the command is then limited to just the attribute value. Note: Any quotes surrounding the value will not be part of the match.

For example, to match any image with the word "Gargalishous!" in its alt attribute, you could use...

<img * alt=$AV(*gargalishous!*) *>

which would work for any of the following...

<img src="foo" alt="My is this trout ever Gargalishous!">
<img src="foo" alt='Gee your hair is Gargalishous! Is that bison flavor?'>
<img src="foo" alt=JustRawGargalishous! >

Even though the match doesn't include the quotes, they'll still be consumed by the command. This means if you want to capture the entire value including quotes you could use a match like...

<img * alt=($AV(\1))\2 *>

Here \2 will contain the full value with its original quotes, while \1 will contain just the raw value itself. For example, given...

<img src="foo" alt="Move all Zig!">

\1 = Move all Zig!
\2 = "Move all Zig!"
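To make the quote handling concrete, here is a rough Python model (not Proxomitron code) of what $AV and $AVQ consume. The helper name `parse_attr_value` and its three-way rule (a double-quoted string, a single-quoted string, or an unquoted run ending at whitespace or ">") are assumptions for illustration; the point is that the quotes are consumed either way, but only the quoted form keeps them.

```python
import re

# Hypothetical quoting rule: "double", 'single', or an unquoted run up to space or ">".
_VALUE_RE = re.compile(r'"([^"]*)"|\'([^\']*)\'|([^\s>]+)')

def parse_attr_value(text: str, pos: int = 0):
    """Return (raw, quoted, end) for the attribute value starting at text[pos].

    raw    -> value without quotes (what $AV(...) matches against)
    quoted -> value with any quotes (what $AVQ(...) matches against)
    end    -> index just past the value; quotes are consumed in both cases
    Returns None if no value starts at pos.
    """
    m = _VALUE_RE.match(text, pos)
    if m is None:
        return None
    raw = next(g for g in m.groups() if g is not None)
    return raw, m.group(0), m.end()

tag = '<img src="foo" alt="Move all Zig!">'
raw, quoted, end = parse_attr_value(tag, tag.index("alt=") + len("alt="))
print(raw)      # Move all Zig!
print(quoted)   # "Move all Zig!"
```

The same helper handles the single-quoted and unquoted alt values from the earlier example.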

But there's also an easier way to do this: just use $AVQ().

$AVQ(match)

Restrictions: Match only
Filter Types: All

This is exactly like $AV(...) except it also includes any quotes in the match. Useful when you just want to capture an attribute's value like so...

<img * alt=$AVQ(\1) * >

which would capture any alt value, including its quotes, into \1.

$LST(blockfile name)

Restrictions: None
Filter Types: All

This is used to include a blockfile in any matching expression. The contents of the blockfile are tested line by line against the text to be matched until a match is found. If no line matches, the expression returns false.
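The first-match-wins loop can be sketched in Python as below. This assumes each blockfile line is a simple `*`/`?` wildcard tested case-insensitively against the whole text; real blockfile entries use the full Proxomitron matching language, which `fnmatch` does not implement, and `lst_match` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional

def lst_match(lines: Iterable[str], text: str) -> Optional[str]:
    """Return the first blockfile line that matches text, or None (i.e. false)."""
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue                      # skip blank lines
        if fnmatchcase(text.lower(), pattern.lower()):
            return pattern                # first match wins; later lines are not tested
    return None

blocklist = ["ads.example.com/*", "*/banner*.gif"]
print(lst_match(blocklist, "www.example.org/img/banner1.gif"))  # */banner*.gif
print(lst_match(blocklist, "www.example.org/index.html"))       # None
```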

$NEST(start match, [inner match,] end match)

Restrictions: Match only
Filter Types: All (but mainly web)

The $NEST command can be used to find the corresponding ending tag or character for a given starting tag or character, even when the same tag is nested within. To use $NEST you must give it a "start match", which matches the opening tag, and an "end match", which matches the closing tag. For example, to match nested <TABLE> tags you might use...

$NEST(<table*>,</table>)

Given the following text, it would match everything from the outer table's opening tag through its own closing </table>, inner table included...

...some HTML...
<table name="outer table">
  ...
  <table name="inner table">
    ...
  </table>
  ...
</table>
...some more HTML...

Notice it manages to find the correct ending tag for the outer table even though there's an inner table using the same tags.

$NEST can also take an optional middle "inner match" parameter. If present, this match is applied to the text within the starting and ending tags matched. It's important to note this doesn't include the starting or ending tags themselves, only what's between them. Again using the example above...

$NEST(<table*>,\1,</table>)

Given the same text, the "\1" would match only what lies between the outer table's opening tag and its final </table> (which still includes the whole inner table)...

...some HTML...
<table name="outer table">
  ...
  <table name="inner table">
    ...
  </table>
  ...
</table>
...some more HTML...
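The idea behind $NEST is a depth counter: each start match opens a level, each end match closes one, and the match ends when the count returns to zero. The Python sketch below illustrates that technique using regular expressions as stand-ins for Proxomitron patterns; the function name `nest` and its return tuple are hypothetical.

```python
import re
from typing import Optional, Tuple

def nest(text: str, start: str, end: str,
         pos: int = 0) -> Optional[Tuple[int, int, int, int]]:
    """Find a balanced start/end pair whose start tag begins at text[pos].

    Returns (outer_start, inner_start, inner_end, outer_end), or None.
    Patterns are regexes and must not match the empty string.
    """
    start_re = re.compile(start, re.IGNORECASE)
    end_re = re.compile(end, re.IGNORECASE)
    first = start_re.match(text, pos)
    if first is None:
        return None
    depth = 1                      # inside the first start tag's element
    i = first.end()
    while i < len(text):
        s = start_re.match(text, i)
        if s:
            depth += 1             # a nested start tag opens another level
            i = s.end()
            continue
        e = end_re.match(text, i)
        if e:
            depth -= 1             # an end tag closes the innermost open level
            if depth == 0:
                return first.start(), first.end(), e.start(), e.end()
            i = e.end()
            continue
        i += 1
    return None                    # ran out of text before the levels balanced

html = ('<p>before</p><table name="outer table"><tr><td>'
        '<table name="inner table"><tr><td>hi</td></tr></table>'
        '</td></tr></table><p>after</p>')
span = nest(html, r"<table[^>]*>", r"</table>", html.index("<table"))
whole = html[span[0]:span[3]]      # like $NEST(<table*>,</table>)
inner = html[span[1]:span[2]]      # like the \1 in $NEST(<table*>,\1,</table>)
print(whole)
print(inner)
```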

$INEST(start match, [inner match,] end match)

Restrictions: Match only
Filter Types: All (but mainly web)

$INEST ("Inner Nest") works just like $NEST above except that the initial starting tag and ending tag are located outside the command. In other words, it assumes you've already found the tag you're looking for and are only interested in finding its end. The example from $NEST above might look like this...

<table name=$AV(*outer*) >$INEST(<table*>,</table>) </table>

Given the following text, the expression as a whole would match the entire outer table, from its opening tag through its final </table>...

...some HTML...
<table name="outer table">
  ...
  <table name="inner table">
    ...
  </table>
  ...
</table>
...some more HTML...

The advantage here is that it's easier to look for a particular starting tag (in this case a table with "outer" in its name) rather than just any starting tag of that type. This would be hard to do with $NEST alone, since any check in the "start match" section would have to be true not only for the outer table but for the nested inner ones as well.
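Here is the same depth-counting idea in Python, adjusted for $INEST: the counter starts at 1 because the specific opening tag has already been found outside the command. This is a sketch only; regexes stand in for Proxomitron patterns, and `inest_end` is a hypothetical name.

```python
import re
from typing import Optional

def inest_end(text: str, pos: int, start: str, end: str) -> Optional[int]:
    """Given an opening tag (matched elsewhere) that ends at text[pos],
    return where its closing tag begins, or None if unbalanced."""
    start_re = re.compile(start, re.IGNORECASE)
    end_re = re.compile(end, re.IGNORECASE)
    depth = 1                          # the opening tag was already consumed
    i = pos
    while i < len(text):
        s = start_re.match(text, i)
        e = None if s else end_re.match(text, i)
        if s:
            depth += 1
            i = s.end()
        elif e:
            depth -= 1
            if depth == 0:
                return e.start()       # inner content is text[pos:e.start()]
            i = e.end()
        else:
            i += 1
    return None

html = ('<div><table name="inner table"></table>'
        '<table name="outer table"><tr><td>'
        '<table name="inner table"></table>'
        '</td></tr></table></div>')
# Select the specific opening tag first, as <table name=$AV(*outer*) > does.
opening = re.search(r'<table name="[^"]*outer[^"]*"\s*>', html, re.IGNORECASE)
close = inest_end(html, opening.end(), r"<table[^>]*>", r"</table>")
outer = html[opening.start():close + len("</table>")]
print(outer)
```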

$SET(\# or \0-\9=Value)

Restrictions: Match only
Filter Types: All

Use this to set a positional variable to a specific value. Any replacement text, including other variables, can be entered here. The first parameter is a positional variable \0 through \9 or the replacement stack variable \# (the "\" is optional). Next comes an equals sign followed by the value to set. For example...

Set \1 equal to "foobar": $SET(1=foobar)

Set variable \1 to print the contents of \2: $SET(1=Two is \2)

By placing $SET commands within a matching expression, you can set various values *only* if the matching expression reaches that point. This can be used for an if/then/else effect...

Match: name=(one $SET(0=Naoko) | two $SET(0=Atsuko)
              | three $SET(0=Michie) | $SET(0=Default))
Replace: "\0 Matched"

will produce the following results...

if name=one then "Naoko Matched"
if name=two then "Atsuko Matched"
if name=three then "Michie Matched"
else "Default Matched"

The $SET command has an important limitation: the value a variable is set to isn't "expanded" until it's actually used in the replacement text. This means if \1 is "fish" and you use a command like $SET(\2=\1 food), \2 will not become "fish food" but will literally be set to "\1 food". This is only expanded to "fish food" when \2 is printed in the replacement section. Why is this important? For one thing, it means you can't set a variable to include part of itself, as in $SET(\1=something and \1), since expanding \1 would then refer back to \1.
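The deferred expansion described above can be modeled in a few lines of Python: values are stored as raw text and only expanded when printed. This is an illustrative sketch, not Proxomitron's implementation; `expand` is a hypothetical name, unset variables expanding to an empty string is an assumption, and raising an error on self-reference is just this sketch's way of showing why such a value can never finish expanding.

```python
import re

# Matches \0 .. \9 in replacement text (\# is left out of this sketch).
VAR_RE = re.compile(r"\\([0-9])")

def expand(template: str, variables: dict, _active: frozenset = frozenset()) -> str:
    """Expand variable references at print time, following nested references."""
    def sub(m):
        name = m.group(1)
        if name in _active:
            # A variable that (indirectly) contains itself can never finish expanding.
            raise ValueError(f"\\{name} refers to itself")
        # Assumption: an unset variable expands to an empty string.
        return expand(variables.get(name, ""), variables, _active | {name})
    return VAR_RE.sub(sub, template)

variables = {"1": "fish"}
variables["2"] = r"\1 food"        # $SET(2=\1 food): stored literally, not expanded
print(variables["2"])              # \1 food
print(expand(r"\2", variables))    # fish food
```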

$CON(x,y[,z])

Restrictions: Match only
Filter Types: All

$CON is true only if the current connection number is 'x' of 'y' (optionally for every 'z' connections). It can be used to rotate values based on the connection. For example, the following will rotate between three values in \0 ...

($CON(1,3) $SET(0=Value one of three)|
$CON(2,3) $SET(0=Value two of three)|
$CON(3,3) $SET(0=Value three of three))

Use the 'z' option to slow the rotation so it advances only every so many connections.
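One plausible reading of the rotation arithmetic, written as a Python sketch: connection numbers count from 1, and with 'z' each of the 'y' slots lasts 'z' consecutive connections. Both of these are assumptions (the text above does not spell out the exact formula), and `con` is a hypothetical name.

```python
def con(n: int, x: int, y: int, z: int = 1) -> bool:
    """Sketch of $CON(x,y[,z]) for connection number n (counting from 1).

    Assumption: each of the y slots lasts z consecutive connections.
    """
    slot = ((n - 1) // z) % y + 1
    return slot == x

# Which of the three $CON(x,3) alternatives fires for connections 1..7:
print([next(x for x in (1, 2, 3) if con(n, x, 3)) for n in range(1, 8)])
# With z=2 each value is kept for two connections before rotating:
print([next(x for x in (1, 2, 3) if con(n, x, 3, 2)) for n in range(1, 8)])
```

Under this reading the first line prints `[1, 2, 3, 1, 2, 3, 1]` and the second `[1, 1, 2, 2, 3, 3, 1]`.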

$IHDR(header-name:matching)
$OHDR(header-name:matching)

Restrictions: Match only
Filter Types: All

$OHDR and $IHDR can be used to test the value of any outgoing or incoming HTTP header, respectively. First give the specific header to test (no wildcards), followed by a colon and a match for the header's value (wildcards are allowed here). The command is true only if the named header's value matches the 'matching' section. For example, this will only match if the "Referer" header contains 'microsoft.com'

$OHDR(Referer:*.microsoft.com)

Using these, you can make a web filter match only if specific header values are also true, or capture header values into variables for use in a filter's replacement. You can also use them in HTTP header filters to check combinations of headers for a match.
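A small Python sketch of the header test: find the one header named exactly (compared case-insensitively, as HTTP header names are), then wildcard-match its value. Using `fnmatch` with a trailing `*` added, so the pattern only has to match the start of the value, is an assumption chosen to mimic the "contains" behavior of the example; `hdr_match` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Mapping

def hdr_match(headers: Mapping[str, str], name: str, pattern: str) -> bool:
    """Sketch of $OHDR(name:pattern) / $IHDR(name:pattern)."""
    for key, value in headers.items():
        if key.lower() == name.lower():     # header names compare case-insensitively
            # Assumption: the pattern only has to match the start of the value,
            # so a trailing "*" is added; matching is case-insensitive.
            return fnmatchcase(value.lower(), pattern.lower() + "*")
    return False                            # header not present at all

outgoing = {"Host": "www.example.org",
            "Referer": "http://www.microsoft.com/en-us/"}
print(hdr_match(outgoing, "Referer", "*.microsoft.com"))   # True
print(hdr_match(outgoing, "Referer", "*.example.com"))     # False
```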

$URL(matching value)

Restrictions: Match only
Filter Types: All

$URL can be used to test the URL inside the matching portion of a filter. Normally you would use the filter's URL match for this, but with this command you can check for different URLs depending on the text matched. It's also useful for capturing portions of a URL into variables. The following would capture the URL's path...

$URL(www.somehost.com/\1)

As elsewhere, the URL matching starts directly with the hostname so there's no need to match the "http://" portion.
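As a rough Python equivalent of $URL(www.somehost.com/\1): drop the scheme, check that the URL starts with the given host, and return what follows as the capture. This is only a sketch; the hypothetical `url_capture` does a plain case-insensitive prefix test rather than full Proxomitron matching.

```python
import re
from typing import Optional

def url_capture(url: str, host: str) -> Optional[str]:
    """Sketch of $URL(host/\\1): return the text after host/, or None."""
    # Matching starts at the hostname, so strip any "scheme://" first.
    bare = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url, flags=re.IGNORECASE)
    prefix = host.rstrip("/") + "/"
    if bare.lower().startswith(prefix.lower()):
        return bare[len(prefix):]           # this plays the role of \1
    return None

print(url_capture("http://www.somehost.com/images/logo.gif", "www.somehost.com"))
print(url_capture("http://www.other.com/x", "www.somehost.com"))
```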

$TYPE(code)

Restrictions: Match only
Filter Types: Web filters

Content Type check command. This command can be used to limit a filter to only affect certain types of pages (like JavaScript files only). The "code" must be one of the following known types...

htm - Web pages
css - Cascading style sheets
js - JavaScript
vbs - VB Script
oth - Anything else

This can be useful for web filters, especially in their URL match. Its value is undefined in header filters (since the content type may not be known yet). Keep in mind it's a fast and simple check. For more complex content-type checks you can use "$IHDR(Content-Type: ... )" where "..." is any matching expression, including wildcards.
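To illustrate how a coarse $TYPE-style code might be derived from a response, here is a Python sketch that maps the MIME type from a Content-Type header onto the five codes. The table of MIME types is an assumption made for illustration; Proxomitron's own detection may use different or additional cues, and `type_code` is a hypothetical name.

```python
def type_code(content_type: str) -> str:
    """Map a Content-Type value to htm/css/js/vbs/oth (hypothetical mapping)."""
    mime = content_type.split(";", 1)[0].strip().lower()   # drop "; charset=..."
    table = {
        "text/html": "htm",
        "text/css": "css",
        "application/javascript": "js",
        "application/x-javascript": "js",
        "text/javascript": "js",
        "text/vbscript": "vbs",
    }
    return table.get(mime, "oth")

print(type_code("text/html; charset=UTF-8"))   # htm
print(type_code("image/png"))                  # oth
```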

$RDIR(http://some.url.com/)

Restrictions: Match or replace
Filter Types: IN or OUT header filters only

The $RDIR (redirect) command is used to transparently redirect URLs to a different location. It's also possible to redirect to a local file by using the "http://file//filename" URL command syntax. The new URL must be of a type Proxomitron understands (http, or with SSLeay, https).

Use both $RDIR and $JUMP (see below) commands in the replacement section of header filters only. It's important to note that for outgoing headers the redirect will happen before the original site is ever contacted, but when used with incoming headers, the initial site must be contacted first. These commands have no effect in web filters since by this point the original page has already begun loading into your browser. In such cases you can often use JavaScript to change to a new location as in...

<script> document.location="http://some.new.url/"; </script>

$JUMP(http://some.url.com/)

Restrictions: Match or replace
Filter Types: IN or OUT header filters only

Similar to the $RDIR command, the $JUMP command can be used to redirect a URL to a different location. However, instead of redirecting transparently, it works by telling your browser to go to the new location. It's more like using a refresh "meta" tag or setting document.location in JavaScript (it actually sends a 302 redirect response to the browser).

With $JUMP your browser is aware of the redirection, and the URL you see will be updated to reflect the new location. It works best for redirecting an entire page, while $RDIR is better at invisibly redirecting individual page elements like images or Java applets. Use $RDIR when you want the redirection to happen "behind the scenes", and use $JUMP when you simply want to go to a different site from the one requested.
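The difference between the two commands comes down to what the proxy sends back to the browser. The Python sketch below models only that response, under simplifying assumptions: the `Response` class, `rdir`, `jump` and `fake_fetch` are all hypothetical, and a real response would carry more headers. $RDIR answers the original request with content fetched from the new URL, while $JUMP answers with a 302 and a Location header so the browser makes the new request itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""

def rdir(new_url: str, fetch: Callable[[str], bytes]) -> Response:
    # $RDIR-style: the proxy fetches new_url itself and answers the original
    # request with that content; the browser keeps showing the old URL.
    return Response(200, {}, fetch(new_url))

def jump(new_url: str) -> Response:
    # $JUMP-style: answer with a 302 so the browser requests new_url itself
    # and updates the address it displays.
    return Response(302, {"Location": new_url})

def fake_fetch(url: str) -> bytes:         # stand-in for a real HTTP fetch
    return f"contents of {url}".encode()

print(rdir("http://mirror.example/pic.gif", fake_fetch))
print(jump("http://www.example.com/"))
```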

$STOP()

Restrictions: None
Filter Types: All

$STOP is a very simple command. If encountered in either the match or replace of a filter $STOP will turn that filter off for the rest of the page/connection. The current match will be allowed to complete, but once that happens, no further matching will be done.

This can be very useful for filters you only want to match once, especially ones that insert something into a page at a given point. For example, say you wanted to insert a small script after the <BODY> tag. You could use...

Match: <body\s \1>
Replace: <body \1>
<script>my script</script>
$STOP()

Not only does this ensure the script will only be inserted once, but it also speeds things up since the filter doesn't waste any more time looking.
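The effect of $STOP() on a filter can be sketched in Python as a filter object that switches itself off after its first successful replacement, so later chunks of the page are passed through without being searched. The class name `OnceFilter` is hypothetical, and the regex is only a loose stand-in for the `<body\s \1>` match above.

```python
import re

class OnceFilter:
    """Sketch of a web filter whose replacement ends with $STOP()."""

    def __init__(self, match: str, replacement: str):
        self.pattern = re.compile(match, re.IGNORECASE)
        self.replacement = replacement
        self.active = True

    def apply(self, chunk: str) -> str:
        if not self.active:
            return chunk                   # stopped: skip searching entirely
        new_chunk, count = self.pattern.subn(self.replacement, chunk, count=1)
        if count:
            self.active = False            # $STOP(): no further matches on this page
        return new_chunk

f = OnceFilter(r"<body(\s[^>]*)?>", r"\g<0>" + "\n<script>my script</script>")
page = ["<html><body class='main'>", "<p>text</p><body>"]
result = [f.apply(chunk) for chunk in page]
print(result)
```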

$FILTER(True or False boolean value)

Restrictions: Match or replace
Filter Types: IN or OUT header filters only

The $FILTER command can be used to force a particular request to be filtered or not filtered regardless of its type. Normally only specific types are filtered (like text/html, text/css, image/gif, etc). $FILTER can be used in the match or replace of any header filter and takes a "true" or "false" value. If true, the request will be run through the web filters regardless of its type. Beware: this only makes sense for text-based content.

You can also use it to avoid "freezing" certain GIF images by using it in a header filter along with a URL match.

Take for example...

Out = "True"
Key = "URL: Don't freeze this gif"
URL = "www.somewhere.com/animate_me.gif"
Replace = "$FILTER(False)"
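The decision $FILTER overrides can be pictured as a two-step rule: an explicit true/false from a header filter wins, otherwise the normal type list decides. The Python sketch below assumes an illustrative default set built from the types named above (Proxomitron's actual list may differ), and `should_filter` is a hypothetical name.

```python
from typing import Optional

# Assumption: an illustrative default set; Proxomitron's actual list may differ.
DEFAULT_FILTERED = {"text/html", "text/css", "image/gif"}

def should_filter(content_type: str, override: Optional[bool] = None) -> bool:
    """Decide whether a response is processed.

    override models $FILTER(True) / $FILTER(False) from a header filter;
    None means no $FILTER command fired for this request.
    """
    if override is not None:
        return override                     # $FILTER always wins
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in DEFAULT_FILTERED

print(should_filter("image/gif"))                    # normally processed
print(should_filter("image/gif", override=False))    # like the animate_me.gif example
```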

$USEPROXY(True or False boolean value)

Restrictions: Match or replace
Filter Types: OUT header filters only

The $USEPROXY command also takes a "true" or "false" boolean value and can override the "Use remote proxy" check box for a given connection, turning the proxy either on or off. It can be used to ensure a proxy is or isn't used with a given site or for a particular action.

To have effect this command must be called in either the match or replace of an *outgoing* header filter. This is because the proxy setting must be established prior to connecting to the site.

$SETPROXY(remote.proxy.name[:port])

Restrictions: Match or replace
Filter Types: OUT header filters only

The $SETPROXY command will force a connection to use a particular proxy. It overrides both the "Use remote proxy" checkbox and the current proxy chosen in the proxy selector. It's useful for ensuring a particular proxy is used in a given situation or with a particular URL.

The proxy to set must be one already entered into the External Proxy Selector list. This command simply looks up and sets a proxy from that list. It's usually only necessary to type the first part of the proxy name; the first proxy matched in the list will be used. The partial match must be exact, though (no wildcards).

Like $USEPROXY, this command must also be called in either the match or replace of an *outgoing* header filter.
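The precedence between the checkbox, $USEPROXY and $SETPROXY can be sketched in Python as follows. Two points are assumptions not covered by the text above: that $SETPROXY wins if both commands fire, and that an unmatched $SETPROXY name falls back to the normal settings. The function `choose_proxy` and the proxy names are hypothetical.

```python
from typing import List, Optional

def choose_proxy(proxy_list: List[str], current: Optional[str], use_remote: bool,
                 use_proxy: Optional[bool] = None,
                 set_proxy: Optional[str] = None) -> Optional[str]:
    """Return the proxy to connect through, or None for a direct connection."""
    if set_proxy is not None:
        for entry in proxy_list:
            if entry.startswith(set_proxy):   # exact partial match, no wildcards
                return entry                  # overrides checkbox and selector
        # Assumption: if nothing matches, fall back to the normal settings.
    enabled = use_remote if use_proxy is None else use_proxy   # $USEPROXY overrides checkbox
    return current if enabled else None

proxies = ["proxy1.example.net:8080", "proxy2.example.net:3128"]
print(choose_proxy(proxies, proxies[0], use_remote=False, set_proxy="proxy2"))
print(choose_proxy(proxies, proxies[0], use_remote=True, use_proxy=False))
```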

$UESC(escaped text)

Restrictions: Replace only
Filter Types: All

The $UESC command is intended to be similar to the JavaScript unescape() function. It will convert most URL-escaped characters back to their ASCII form. It's useful for unescaping URLs that may be embedded in other URLs (an increasingly common trick used by many sites to track the links you click). Often characters like ":" and "/" will be escaped by their hex equivalents ("%3A" and "%2F"), making the real URL hard to use.

$UESC can be used in the replacement text of a filter, and can be given any valid replacement text as input (such as \1 variables). It will convert most escaped characters back to their correct form, but spaces and any non-displayable ASCII characters will remain escaped.
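A Python sketch of the unescaping rule described above: decode each %XX escape whose character is printable ASCII, and leave spaces and non-displayable characters escaped. Treating bytes above 0x7E as "non-displayable" is an assumption, and `uesc` is a hypothetical name.

```python
import re

def uesc(text: str) -> str:
    """Decode %XX escapes, except spaces and non-displayable characters."""
    def decode(m):
        code = int(m.group(1), 16)
        if 0x21 <= code <= 0x7E:           # printable ASCII, excluding space
            return chr(code)
        return m.group(0)                  # keep %20, control and 8-bit escapes
    return re.sub(r"%([0-9A-Fa-f]{2})", decode, text)

link = "http://track.example/click?u=http%3A%2F%2Fwww.example.com%2Fmy%20page.html"
real = uesc(link.split("u=", 1)[1])        # the embedded URL, unescaped
print(real)                                # http://www.example.com/my%20page.html
```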

$ESC(any text)

Restrictions: Replace only
Filter Types: All

The $ESC command is the reverse of the $UESC command. Similar in function to the JavaScript escape() function, it converts most non-alphanumeric characters into hexadecimal escape codes (of the form %xx), making them safe to include as part of a URL.

$ESC can be used in the replacement text of a filter, and can be given any valid replacement text as input (such as \1 variables).
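For comparison, here is a Python sketch of JavaScript's escape(): letters, digits and @*_+-./ are left alone, other characters become %XX, and UTF-16 code units above 0xFF become %uXXXX. Whether $ESC keeps exactly the same set of safe characters is an assumption; the text above only says "most non-alphanumeric characters" are converted. `esc` is a hypothetical name.

```python
SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./")

def esc(text: str) -> str:
    """Escape text the way JavaScript's escape() does (assumed to match $ESC)."""
    data = text.encode("utf-16-be", "surrogatepass")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")   # one UTF-16 code unit, as JS sees it
        ch = chr(unit)
        if ch in SAFE:
            out.append(ch)
        elif unit < 0x100:
            out.append(f"%{unit:02X}")
        else:
            out.append(f"%u{unit:04X}")
    return "".join(out)

print(esc("http://x.com/a b?c=1&d"))   # http%3A//x.com/a%20b%3Fc%3D1%26d
```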

